.TH GPTLpr_summary 3 "May, 2020" "GPTL"

.SH NAME
GPTLpr_summary \- Print a statistical summary of region stats across all threads and tasks

.SH SYNOPSIS
.B C/C++ Interface:
.nf
#include <gptl.h>
#include <gptlmpi.h>
int GPTLpr_summary (MPI_Comm comm);
int GPTLpr_summary_file (MPI_Comm comm, char *outfile);
.fi

.B Fortran Interface:
.nf
use gptl
integer gptlpr_summary (integer comm)
integer gptlpr_summary_file (integer comm, character(len=*) outfile)
.fi

.SH DESCRIPTION
Provide max, min, mean, and standard deviation stats for all timed regions across all threads
and tasks. GPTLpr_summary() writes data to a file named
.B timing.summary.
GPTLpr_summary_file() writes the same information to a file specified by input argument outfile.
If PAPI counters were enabled, they are included in the summary.
.P
The computation algorithm uses a binary tree so it scales easily to many thousands of cores
with minimal additional per-core memory. Mean and standard deviation stats use the one-pass 
algorithm described in http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
Mean and standard deviation are across
.B ranks,
where each data point is represented by the maximum time across threads owned by the rank.

.SH ARGUMENTS
.TP
.I comm
-- MPI communicator to gather stats across

.TP
.I outfile
-- Name of output file

.SH RESTRICTIONS
.B GPTLinitialize()
must have been called. Since GPTLpr_summary() and GPTLpr_summary_file use MPI, either must be
called after
.B MPI_Initialize()
and before
.B MPI_Finalize()

.SH RETURN VALUES
On success, this function returns 0. On error, a negative error code is returned and a 
descriptive message printed. 

.SH EXAMPLE OUTPUT
Below is GPTLpr_summary output from a simple 2-rank, 4-thread run with 4 regions timed.
.P
.nf
.if t .ft CW
total ranks in communicator=2
nthreads on rank 0=1
'N' used for mean, std. dev. calcs.: 'ncalls'/'nthreads'
'ncalls': number of times the region was invoked across tasks and threads.
'nranks' is the number of ranks which invoked the region.
mean and std. dev. are across tasks for max time taken by any thread.
wallmax and wallmin are across tasks and threads.

name   ncalls nranks mean_time std_dev wallmax (rank thread) wallmin (rank thread)
total       2      2     0.754   0.639   1.206 (   1      0)   0.303 (   0      0)
region1     4      2     0.250   0.071   0.300 (   0      1)   0.100 (   1      0)
region2     4      2     0.000   0.000   0.000 (   0      0)   0.000 (   1      1)
region3     1      1     1.000   0.000   1.000 (   1      0)   1.000 (   1      0)
.if t .ft P
.fi

.SH NOTES
Building GPTL with MPI enabled means all executables linked with GPTL will require linking 
with the MPI library as well.
.P
C applications must be linked with the 
.B -lm
flag because standard deviation calculations require the square root function.

.SH AUTHOR
Jim Rosinski. With inspiration from Pat Worley.
.SH SEE ALSO
.BR GPTLpr "(3)" 
.BR GPTLpr_file "(3)" 
